Non-parametric Bayesian Segmentation of Japanese Noun Phrases
Abstract
A key factor in high-quality word segmentation for Japanese is a high-coverage dictionary, but building such a lexical resource manually is costly. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
Similar resources
Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem
We propose a language-independent data-driven method to exhaustively extract bursty phrases of arbitrary forms (e.g., phrases other than simple noun phrases) from microblogs. The burst (i.e., the rapid increase of the occurrence) of a phrase causes the burst of overlapping N-grams including incomplete ones. In other words, bursty incomplete N-grams inevitably overlap bursty phrases. Thus, the pro...
An Estimate of Referent of Noun Phrases in Japanese Sentences
In machine translation and man-machine dialogue, it is important to clarify referents of noun phrases. We present a method for determining the referents of noun phrases in Japanese sentences by using the referential properties, modifiers, and possessors of noun phrases. Since the Japanese language has no articles, it is difficult to decide whether a noun phrase has an antecedent or not. We h...
An Estimate of Referent of Noun Phrases in Japanese Sentences
In machine translation and man-machine dialogue, it is important to clarify referents of noun phrases. We present a method for determining the referents of noun phrases in Japanese sentences by using the referential properties, modifiers, and possessors of noun phrases. Since the Japanese language has no articles, it is difficult to decide whether a noun phrase has an antecedent or not. We had ...
Semantic Properties of (Non-)Floating Quantifiers and their Syntactic Implications
It is well-known that, in Japanese, classifier phrases (ClPs) (e.g., 3 classifier) or measure phrases (MPs) (e.g., 3 liters) can ‘float’ in that they can be separated from the host noun, as in (1b) and (2b). ClPs and MPs in this configuration are referred to as floating quantifiers (FQs), and their non-floated counterparts in (1a) and (2a) in which ClPs or MPs are adjacent to their host noun a...
Semantic Analysis of Japanese Noun Phrases - A New Approach to Dictionary-Based Understanding
This paper presents a new method of analyzing Japanese noun phrases of the form N1 no N2. The Japanese postposition no roughly corresponds to of, but it has much broader usage. The method exploits a definition of N2 in a dictionary. For example, rugby no coach can be interpreted as a person who teaches technique in rugby. We illustrate the effectiveness of the method by the analysis of 300 test...
Publication date: 2011